Software for Annotating Argument Structure

Authors

Wojciech Skut

Brigitte Krenn

Thorsten Brants

Hans Uszkoreit

Abstract

We present a tool developed for annotating corpora with argument structure representations. The presentation focuses on the architecture of the annotation scheme and on a number of techniques for increasing the efficiency and accuracy of annotation. Among other things, we show how the assignment of grammatical functions can be automatised using standard part-of-speech tagging methods.

1 The Annotation Scheme

Several features of the tool have been introduced to suit the requirements imposed by the architecture of the annotation scheme (cf. Skut et al., 1997), which can be characterised as follows:

• Direct representation of the underlying argument structure in terms of unordered trees;
• Rudimentary, flat representations; uniform treatment of local and non-local dependencies;
• Extensive encoding of linguistic information in grammatical function labels.

Thus the format of the annotations differs somewhat from that of treebanks relying on a context-free backbone augmented with trace-filler annotations of non-local dependencies (cf. Marcus et al., 1994; Sampson, 1995; Black et al., 1996). Nevertheless, such treebanks can also be developed using our tool. To back this claim, the representation of structures from the SUSANNE corpus (cf. Sampson, 1995) will be shown in the presentation.

2 User Interface

A screen dump of the tool is shown in fig. 1. The largest part of the window contains the graphical representation of the structure being annotated. The nodes and edges are assigned category and grammatical function labels, respectively. The words are numbered and labelled with part-of-speech tags. Any change to the structure of the sentence being annotated is displayed immediately. Extra effort has been put into the development of a convenient keyboard interface. Menus are supported as a useful way of getting help on commands and labels. Automatic completion and error checking of user input are also supported.
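The representation described above (category labels on nodes, grammatical function labels on edges, numbered and part-of-speech-tagged words, and no requirement that a phrase cover adjacent words) can be illustrated with a small data structure. This is only a sketch under assumptions: the class names `Word` and `Phrase`, the helper `yield_indices`, the example sentence and all tag names are hypothetical and are not taken from the tool or its tagsets.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Word:
    index: int   # position of the word in the sentence
    form: str    # surface form
    pos: str     # part-of-speech tag


@dataclass
class Phrase:
    category: str  # phrasal category label carried by the node
    # Each child is stored together with the grammatical function
    # label of the edge that connects it to this node.
    children: List[Tuple[str, Union["Phrase", Word]]] = field(default_factory=list)

    def attach(self, function: str, child: Union["Phrase", Word]) -> "Phrase":
        self.children.append((function, child))
        return self


def yield_indices(node: Union[Phrase, Word]) -> List[int]:
    """Return the sorted word positions covered by a node."""
    if isinstance(node, Word):
        return [node.index]
    return sorted(i for _, child in node.children for i in yield_indices(child))


# Hypothetical sentence "Das Buch hat er gelesen" with illustrative tags.
words = [
    Word(0, "Das", "ART"), Word(1, "Buch", "NN"), Word(2, "hat", "VAFIN"),
    Word(3, "er", "PPER"), Word(4, "gelesen", "VVPP"),
]
np = Phrase("NP").attach("NK", words[0]).attach("NK", words[1])
# The VP groups the object and the participle although they are not adjacent;
# the non-local dependency is encoded by an ordinary labelled edge.
vp = Phrase("VP").attach("OA", np).attach("HD", words[4])
s = Phrase("S").attach("OC", vp).attach("HD", words[2]).attach("SB", words[3])
```

Because word order is kept on the words rather than in the tree, `yield_indices(vp)` covers positions 0, 1 and 4, so a discontinuous phrase needs no trace or filler node in this sketch.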
Three tagsets have to be defined by the user: part-of-speech tags, phrasal categories and grammatical functions. They are stored together with the corpus, which permits easy modification when needed. The user interface is implemented in Tcl/Tk Version 4.1. The corpus is stored in an SQL database.

3 Automation

To increase the efficiency of annotation and to avoid certain types of errors made by the human annotator, manual and automatic annotation are combined in an interactive way. The automatic component of the tool employs a stochastic tagging model induced from previously annotated sentences. Thus the degree of automation increases with the amount of data available. At the current stage of automation, the annotator determines the substructures to be grouped into a new phrase and assigns it a syntactic category. The assignment of grammatical functions is performed automatically. To do this, we adapted a standard part-of-speech tagging algorithm: the best sequence of grammatical functions is determined for a sequence of syntactic categories (cf. Skut et al., 1997). The annotator supervises the automatic assignment of function tags. To keep the annotator from missing tagging errors, the grammatical function tagger is equipped with a function measuring the reliability of its output. On the basis of the difference between the best and second-best assignment, the prediction is classified as belonging to one of the following certainty intervals:

Reliable: the most probable tag is assigned,
Less reliable: the tagger suggests a function tag; the annotator is asked to confirm the choice,
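The two ideas in this section can be sketched in a few lines of Python. The first function is a generic Viterbi search that finds the most probable sequence of grammatical functions for a sequence of syntactic categories, in the way a bigram part-of-speech tagger would; the second classifies a prediction by the margin between the best and second-best candidate. This is a sketch under assumptions and not the tool's implementation: the function names, the toy probability tables, the tag names, the threshold value and the reading of "difference" as a plain difference of probabilities are all hypothetical. Only the two certainty intervals visible in the text are modelled.

```python
from math import log


def viterbi(categories, functions, start, trans, emit):
    """Most probable function sequence for a category sequence (bigram model)."""
    # scores[i][f] = (log-probability of the best path ending in f, previous function)
    scores = [{f: (log(start[f]) + log(emit[f][categories[0]]), None)
               for f in functions}]
    for cat in categories[1:]:
        column = {}
        for f in functions:
            prev = max(functions, key=lambda p: scores[-1][p][0] + log(trans[p][f]))
            column[f] = (scores[-1][prev][0] + log(trans[prev][f]) + log(emit[f][cat]),
                         prev)
        scores.append(column)
    # Follow the back-pointers from the best final function.
    path = [max(functions, key=lambda f: scores[-1][f][0])]
    for column in reversed(scores[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]


def classify_reliability(candidates, threshold=0.5):
    """Classify a prediction by the margin between its two best candidates.

    `candidates` maps function tags to probabilities; `threshold` is a
    hypothetical cut-off, not a value from the paper.
    """
    ranked = sorted(candidates.values(), reverse=True)
    margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
    return "reliable" if margin >= threshold else "less reliable"


# Toy model (all numbers invented): children of a sentence node.
functions = ["SB", "HD", "OA"]
start = {"SB": 0.6, "HD": 0.2, "OA": 0.2}
trans = {
    "SB": {"SB": 0.1, "HD": 0.8, "OA": 0.1},
    "HD": {"SB": 0.3, "HD": 0.1, "OA": 0.6},
    "OA": {"SB": 0.2, "HD": 0.6, "OA": 0.2},
}
emit = {
    "SB": {"NP": 0.9, "VVFIN": 0.1},
    "HD": {"NP": 0.1, "VVFIN": 0.9},
    "OA": {"NP": 0.9, "VVFIN": 0.1},
}

best = viterbi(["NP", "VVFIN", "NP"], functions, start, trans, emit)
# best == ["SB", "HD", "OA"]
```

With these toy numbers, a candidate set such as `{"SB": 0.9, "OA": 0.1}` is classified as reliable and assigned directly, whereas `{"SB": 0.55, "OA": 0.45}` falls into the less reliable interval and would be shown to the annotator for confirmation.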

Similar resources

Hindi Syntax: Annotating Dependency, Lexical Predicate-Argument Structure, and Phrase Structure

This paper describes a treebanking project for Hindi/Urdu. We are annotating dependency syntax, lexical predicate-argument structure, and phrase structure syntax in a coordinated and partly automated manner. The paper focuses on choices in syntactic representation, and the stages we think are most appropriate for annotating different types of information.

GLML: Annotating Argument Selection and Coercion

In this paper we introduce a methodology for annotating compositional operations in natural language text, and describe a mark-up language, GLML, based on Generative Lexicon, for identifying such relations. While most annotation systems capture surface relationships, GLML captures the “compositional history” of the argument selection relative to the predicate. We provide a brief overview of GL ...

The Effect of Dynamic Assessment of Toulmin Model through Teacher- and Collective-Scaffolding on Argument Structure and Argumentative Writing Achievement of Iranian EFL Learners

Considering the paramount importance of writing logical arguments for college students, this study investigated the effect of dynamic assessment (DA) of Toulmin model through teacher- and collective-scaffolding on argument structure and overall quality of argumentative essays of Iranian EFL university learners. In so doing, 45 male and female Iranian EFL learners taking part in the study were r...

Software Review: Protein Family Alignment Annotation

For bioscientists studying protein structure and function, the Protein Family Alignment Annotation Tool (Pfaat) is a useful and simple program for annotating collections of proteins. This open-source software includes methods for viewing and aligning protein families, and for annotating sequence structure and residues with known functions. It offers new options to aid the study of proteins, and...

Annotating Predicate-Argument Structure for a Parallel Treebank

We report on a recently initiated project which aims at building a multi-layered parallel treebank of English and German. Particular attention is devoted to a dedicated predicate-argument layer which is used for aligning translationally equivalent sentences of the two languages. We describe both our conceptual decisions and aspects of their technical realisation. We discuss some select...
